SPARK-7237 Clean function in several RDD methods#5959
SPARK-7237 Clean function in several RDD methods#5959ted-yu wants to merge 12 commits intoapache:masterfrom ted-yu:master
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
my understanding is that mappartitionsWithIndex cleans, and thus anything in that is cleaned. maybe it's an incorrect assumption. cc @andrewor14?
There was a problem hiding this comment.
Looks like Reynold is correct.
I can update the PR for collect() and undo the change for other methods touched.
There was a problem hiding this comment.
I think collect might've been cleaned in DAGScheduler's runJob. Double check that.
There was a problem hiding this comment.
I believe that @andrewor14 added some tests for closure cleaning as part of his recent ClosureCleaner patch; we might check whether that test suite covers these methods.
There was a problem hiding this comment.
The test suite does not currently cover these methods because they are deprecated, but maybe we should just add them. (@JoshRosen is referring to ClosureCleanerSuite)
There was a problem hiding this comment.
@rxin @ted-yu actually even though mapPartitionsWithIndex does clean already, it cleans the whole closure but not the ones used in the closure. In this case, I believe it's actually necessary to clean f here since we won't actually clean it from mapPartitionsWithIndex. For the same reason I believe we also need to clean constructA since it's a closure provided by the user.
|
Hey @ted-yu would you mind adding |
|
ok to test |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32138 has started for PR 5959 at commit |
|
@ted-yu Also it would be good to add test cases for the particular methods you are testing in |
|
Test build #32138 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32150 has started for PR 5959 at commit |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32151 has started for PR 5959 at commit |
There was a problem hiding this comment.
note that the tests will fail because these closures are expected to have return statements in them. Can you model these closures after the other ones in this test?
|
Hi @ted-yu I believe this also needs to cover |
|
Test build #32151 has finished for PR 5959 at commit
|
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
Test build #32150 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test build #32236 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
@rxin @andrewor14 |
There was a problem hiding this comment.
why is this commented out?
|
From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32195/consoleFull : [info] - filterWith *** FAILED *** (13 milliseconds) |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32275 has started for PR 5959 at commit |
There was a problem hiding this comment.
@ted-yu I investigated this a little and this won't work because you're bringing in the whole SparkContext into the closure. Instead we need to clean the closures outside of mapPartitionsWithIndex as you have done in other methods. There is nothing special about filterWith.
There was a problem hiding this comment.
once you have done that please uncomment the tests
|
Should have looked closer :-) |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32279 has started for PR 5959 at commit |
|
Test build #32275 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Great, merging this into master and 1.4. Thanks @ted-yu. |
Author: tedyu <yuzhihong@gmail.com> Closes #5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods (cherry picked from commit 54e6fa0) Signed-off-by: Andrew Or <andrew@databricks.com>
|
Test build #32279 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
Author: tedyu <yuzhihong@gmail.com> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
Author: tedyu <yuzhihong@gmail.com> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
Author: tedyu <yuzhihong@gmail.com> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
No description provided.